De novo construction of polyploid linkage maps using discrete graphical models
Linkage maps are used to identify the location of genes responsible for
traits and diseases. New sequencing techniques have created opportunities to
substantially increase the density of genetic markers. Such revolutionary
advances in technology have given rise to new challenges, such as creating
high-density linkage maps. Current multiple testing approaches based on
pairwise recombination fractions are underpowered in the high-dimensional
setting and do not extend easily to polyploid species. We propose to construct
linkage maps using graphical models either via a sparse Gaussian copula or a
nonparanormal skeptic approach. Linkage groups (LGs), typically chromosomes,
and the order of markers in each LG are determined by inferring the conditional
independence relationships among large numbers of markers in the genome.
Through simulations, we illustrate the utility of our map construction method
and compare its performance with other available methods, both when the data
are clean and contain no missing observations and when data contain genotyping
errors and are incomplete. We apply the proposed method to two genotype
datasets: barley and potato from diploid and polyploid populations,
respectively. Our comprehensive map construction method makes full use of the
dosage SNP data to reconstruct linkage maps for any bi-parental diploid or
polyploid species. We have implemented the method in the R package netgwas.
Comment: 25 pages, 7 figures
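As a rough illustration of the nonparanormal skeptic idea named in the abstract above, the sketch below estimates a latent Gaussian correlation matrix from rank correlations using the standard transformation 2·sin(π/6·ρ) of Spearman's ρ. It is not the netgwas implementation: the function names, the toy dosage data, and the handling of ties by average ranks are all assumptions made for this example, and the transformation is only approximate for heavily tied discrete data.

```python
import numpy as np


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)               # rank of the last member of each tie group
    mean_rank = last_rank - (counts - 1) / 2.0  # average rank within each tie group
    return mean_rank[inverse.ravel()]


def skeptic_correlation(data):
    """Rank-based estimate of the latent Gaussian correlation matrix.

    data: (n_samples, n_markers) array; columns must not be constant.
    """
    ranks = np.column_stack([average_ranks(col) for col in data.T])
    spearman = np.corrcoef(ranks, rowvar=False)     # Spearman's rho via Pearson on ranks
    latent = 2.0 * np.sin(np.pi / 6.0 * spearman)   # skeptic transformation
    np.fill_diagonal(latent, 1.0)
    return latent


# Hypothetical toy data: three correlated latent variables cut into dosages 0..4,
# loosely mimicking tetraploid marker dosages (an assumption for illustration).
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.7, 0.49], [0.7, 1.0, 0.7], [0.49, 0.7, 1.0]])
latent_z = rng.multivariate_normal(np.zeros(3), cov, size=500)
dosages = np.digitize(latent_z, bins=[-1.0, -0.3, 0.3, 1.0])
print(np.round(skeptic_correlation(dosages), 2))
```

In a full pipeline, a matrix like this would typically be passed to a sparse precision-matrix estimator (for example a graphical lasso) to obtain the conditional independence graph; that step is omitted here.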
Extensions of graphical models with applications in genetics and genomics
The living cell is a complex system of interacting molecules, in which genes are copied into RNAs and translated into proteins. Most biological characteristics arise from complex interactions between the numerous components of a cell. A key challenge for biology is therefore to understand the structure and dynamics of the complex inter- and intracellular web of interactions that contribute to the structure and function of a living cell. The behaviour of most complex systems, from a cell to the Internet, emerges from the activity of many components that interact pairwise. At an abstract level, these components can be represented by a set of nodes connected by edges, where each edge represents the interaction between two components. Together, the nodes and edges form a network or, in more formal language, a graph. The objectives of this work were to extend graphical models to different data structures and to broaden the applicability of graphical models in various fields, in particular systems genetics. In this thesis we developed a method, based on undirected graphical models, to infer direct relationships between the components of a system. In addition, we extended graphical models to high-dimensional time-series data with a non-Gaussian structure, combining directed and undirected graphical models to investigate dynamic and contemporaneous interactions. We implemented the proposed methods as user-friendly software, named netgwas and tsnetwork, which is freely available to users.
netgwas: An R Package for Network-Based Genome-Wide Association Studies
Graphical models are powerful tools for modeling and making statistical
inferences regarding complex associations among variables in multivariate data.
In this paper we introduce the R package netgwas, which is designed based on
undirected graphical models to accomplish three important and interrelated
goals in genetics: constructing linkage maps, reconstructing linkage
disequilibrium (LD) networks from multi-loci genotype data, and detecting
high-dimensional genotype-phenotype networks. The netgwas package deals with
species with any chromosome copy number in a unified way, unlike other
software. It implements recent improvements in both linkage map construction
(Behrouzi and Wit, 2018) and reconstructing conditional independence networks
for non-Gaussian continuous data, discrete data, and mixed
discrete-and-continuous data (Behrouzi and Wit, 2017). Such datasets, for example
genotype data and genotype-phenotype data, routinely occur in genetics and
genomics. We demonstrate the value of our package functionality by applying it to
various multivariate example datasets taken from the literature. We show, in
particular, that our package allows a more realistic analysis of data, as it
adjusts for the effect of all other variables while performing pairwise
associations. This feature controls for spurious associations between variables
that can arise from a classical multiple-testing approach. This paper includes a
brief overview of the statistical methods which have been implemented in the
package. The main body of the paper explains how to use the package. The
package uses a parallelization strategy on multi-core processors to speed up
computations for large datasets. In addition, it contains several functions for
simulation and visualization. The netgwas package is freely available at
https://cran.r-project.org/web/packages/netgwas
Comment: 32 pages, 9 figures; due to the limitation "The abstract field cannot
be longer than 1,920 characters", the abstract appearing here is slightly
shorter than that in the PDF file
A Spatial Autoregressive Graphical Model with Applications in Intercropping
Within the statistical literature, there is a lack of methods that allow for
asymmetric multivariate spatial effects to model relations underlying complex
spatial phenomena. Intercropping is one such phenomenon. In this ancient
agricultural practice multiple crop species or varieties are cultivated
together in close proximity and are subject to mutual competition. To properly
analyse such a system, it is necessary to account for both within- and
between-plot effects, where between-plot effects are asymmetric. Building on
the multivariate spatial autoregressive model and the Gaussian graphical model,
the proposed method takes asymmetric spatial relations into account, thereby
removing some of the limiting factors of spatial analyses and giving
researchers a better indication of the existence and extent of spatial
relationships. Using a Bayesian-estimation framework, the model shows promising
results in a simulation study. The model is applied to intercropping data
consisting of Belgian endive and beetroot, illustrating the usage of the
proposed methodology. An R package containing the proposed methodology can be
found at https://CRAN.R-project.org/package=SAGM
Reconstruction of Networks with Direct and Indirect Genetic Effects
Genetic variance of a phenotypic trait can originate from direct genetic effects, or from indirect effects, i.e., through genetic effects on other traits, affecting the trait of interest. This distinction is often of great importance, for example, when trying to improve crop yield and simultaneously control plant height. As suggested by Sewall Wright, assessing contributions of direct and indirect effects requires knowledge of (1) the presence or absence of direct genetic effects on each trait, and (2) the functional relationships between the traits. Because experimental validation of such relationships is often infeasible, it is increasingly common to reconstruct them using causal inference methods. However, most current methods require all genetic variance to be explained by a small number of quantitative trait loci (QTL) with fixed effects. Only a few authors have considered the “missing heritability” case, where contributions of many undetectable QTL are modeled with random effects. Usually, these are treated as nuisance terms that need to be eliminated by taking residuals from a multi-trait mixed model (MTM). But fitting such an MTM is challenging, and it is impossible to infer the presence of direct genetic effects. Here, we propose an alternative strategy, where genetic effects are formally included in the graph. This has important advantages: (1) genetic effects can be directly incorporated in causal inference, implemented via our PCgen algorithm, which can analyze many more traits; and (2) we can test the existence of direct genetic effects, and improve the orientation of edges between traits. Finally, we show that reconstruction is much more accurate if individual plant or plot data are used, instead of genotypic means. We have implemented the PCgen algorithm in the R package pcgen.
Natural variation in salt-induced root growth phases and their contribution to root architecture plasticity
The root system architecture of a plant changes during salt stress exposure. Different accessions of Arabidopsis thaliana have adopted different strategies in remodelling their root architecture during salt stress. Salt induces a multiphase growth response in roots, consisting of a stop phase, quiescent phase, recovery phase and eventually a new level of homoeostasis. We explored natural variation in the length of and growth rate during these phases in both main and lateral roots and found that some accessions lack the quiescent phase. Using mathematical models and a correlation-based network allowed us to correlate dynamic traits with overall root architecture and to discover that both the main root growth rate during homoeostasis and lateral root appearance are the strongest determinants of overall root architecture. In addition, this approach revealed a trade-off between investing in main or lateral root length during salt stress. By studying natural variation in high-resolution temporal root growth using mathematical modelling, we gained new insights into the interactions between dynamic root growth traits and identified key traits that modulate overall root architecture during salt stress.
Dietary Intakes of Vegetable Protein, Folate, and Vitamins B-6 and B-12 Are Partially Correlated with Physical Functioning of Dutch Older Adults Using Copula Graphical Models
Background: In nutritional epidemiology, dealing with confounding and complex internutrient relations are major challenges. An often-used approach is dietary pattern analyses, such as principal component analysis, to deal with internutrient correlations, and to more closely resemble the true way nutrients are consumed. However, despite these improvements, these approaches still require subjective decisions in the preselection of food groups. Moreover, they do not make efficient use of multivariate dietary data, because they detect only marginal associations. We propose the use of copula graphical models (CGMs) to model and make statistical inferences regarding complex associations among variables in multivariate data, where associations between all variables can be learned simultaneously. Objective: We aimed to reconstruct nutritional intake and physical functioning networks in Dutch older adults by applying a CGM. Methods: We addressed this issue by uncovering the pairwise associations between variables while correcting for the effect of remaining variables. More specifically, we used a CGM to infer the precision matrix, which contains all the conditional independence relations between nodes in the graph. The nonzero elements of the precision matrix indicate the presence of a direct association. We applied this method to reconstruct nutrient-physical functioning networks from the combined data of 4 studies (Nu-Age, ProMuscle, ProMO, and V-Fit, total n = 662, mean ± SD age = 75 ± 7 y). The method was implemented in the R package nutriNetwork which is freely available at https://cran.r-project.org/web/packages/nutriNetwork. Results: Greater intakes of vegetable protein and vitamin B-6 were partially correlated with higher scores on the total Short Physical Performance Battery (SPPB) and the chair rise test. Greater intakes of vitamin B-12 and folate were partially correlated with higher scores on the chair rise test and the total SPPB, respectively. 
Conclusions: We determined that vegetable protein, vitamin B-6, folate, and vitamin B-12 intakes are partially correlated with improved functional outcome measurements in Dutch older adults.
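To make the Methods statement above concrete, the following minimal sketch shows how nonzero off-diagonal entries of a precision matrix translate into partial correlations and direct edges, using the standard identity pcor_ij = -Θ_ij / sqrt(Θ_ii Θ_jj). It is not the nutriNetwork implementation: the function names, the tolerance, and the three-variable precision matrix are hypothetical and chosen only for illustration.

```python
import numpy as np


def partial_correlations(precision):
    """Convert a precision matrix into a partial correlation matrix."""
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def direct_edges(precision, names, tol=1e-8):
    """List variable pairs whose partial correlation is nonzero (beyond tol)."""
    pcor = partial_correlations(precision)
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(pcor[i, j]) > tol:
                edges.append((names[i], names[j], float(pcor[i, j])))
    return edges


# Hypothetical 3-variable chain A - B - C: A and C are related only through B,
# so their precision-matrix entry (and partial correlation) is zero.
theta = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
for a, b, r in direct_edges(theta, ["A", "B", "C"]):
    print(f"{a} -- {b}: partial correlation {r:.2f}")
```

In this toy case only the pairs A, B and B, C are reported, each with a partial correlation of 0.50; A and C are marginally correlated but have no direct edge once B is accounted for.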